ABSTRACT

Through utilization of effective sampling procedures, libraries may
obtain substantial savings in terms of data collection costs. A
theoretical statistical sampling model is presented and two types of
random sampling techniques are empirically compared as to their
effectiveness in estimating a library usage parameter. Implications
are drawn for the possible use of these techniques in a library
setting. (Author)



A General Statistical Model for Increasing Efficiency and Confidence
in Manual Data Collection Systems Through Sampling

by

Terry Lied 
and 

Don L. Tolliver 






November 1973



Introduction

Without question, current information on the operations of a large
university library system is essential for its proper management and
administration. Increasingly, managers of libraries are faced with the
need for more data to better monitor the library system. Added data
becomes necessary to complete internal comparisons, to observe a library
sub-system over time, to compare one library with others, and to satisfy
external requests for varied and more detailed data. It is likely that
this continued pressure for additional data will eventually overload
manual collection routines.

This overload may cause administrators to examine the various
continuous counting procedures that have become established daily library
routines. They begin to search for more efficient data gathering methods
to replace traditional procedures. Often seemingly "straightforward"
sampling techniques are instituted with an intent to efficiently meet the
requirements for data gathering. Yet, these techniques may or may not be
effective in providing the required data.

The main objective of this study was to compare two accepted
sampling techniques and determine which method would provide the best
estimate of a library usage parameter. The first sampling technique examined
was a pure random sampling method, and the second was a stratified random
sampling technique.

The Theoretical Sampling Model

One of the sampling techniques selected to estimate the library usage
parameter was the pure random sampling method (Dixon & Massey, 1969). As
applied to this problem, the technique was one in which the particular
semester days selected for estimating the parameter would be chosen at
random and without replacement. This particular sampling technique was
chosen for examination because it is simple to employ, it is free of bias
(when properly used), and it is a widely used sampling method. The pure
random sampling method is based on a theoretical model which requires some
elaboration.

Assume that a number of equal-sized samples of semester days is
drawn (without replacement) from the population of calendar days in one
semester. For each of the days selected in each sample, a number is
obtained corresponding to the total number of patrons utilizing the
library for that particular day. The distribution of the means of each of
these samples is assumed to be normally distributed and has a standard
deviation. This standard deviation is known as the standard error of the
mean and is represented by the following equation,

$$\text{S.D. of } \bar{x} = \sqrt{\frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1}}$$

Where S.D. of $\bar{x}$ = standard error of the mean,

$\sigma^2$ = the population variance; in this case, the variance of
the daily number of patrons utilizing the library for
one semester.

$N$ = the size of the population; in this case, the total
number of days the library is open during the semester.

$n$ = size of the sample; in this case, the total number of
days chosen for sampling the number of patrons utilizing
the library.

Random sampling can be effectively used in conjunction with the above
mathematical relationship to provide estimates of library usage parameters
as well as confidence regions around those estimated parameters. It
should be noted that the sampling procedure itself would yield the
estimates of the parameters, whereas the above mathematical relationship
could be used to provide the confidence regions around these estimated
parameters.
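
The relationship described above can be sketched in a few lines of Python. This is only an illustration, assuming the standard error takes the usual finite-population-corrected form for sampling without replacement; the function names `standard_error` and `confidence_region` are hypothetical, and the inputs (a standard deviation of 739 over 112 semester days) are the values reported in the Methods section.

```python
import math


def standard_error(pop_sd, pop_size, sample_size):
    """Standard error of a sample mean drawn without replacement.

    Assumed form: sqrt(sd^2 / n * (N - n) / (N - 1)).
    """
    if pop_size < 2 or not 0 < sample_size <= pop_size:
        raise ValueError("need pop_size >= 2 and 0 < sample_size <= pop_size")
    correction = (pop_size - sample_size) / (pop_size - 1)
    return math.sqrt(pop_sd ** 2 / sample_size * correction)


def confidence_region(estimate, std_error, multiplier):
    """Return (low, high) = estimate -/+ multiplier * std_error."""
    half_width = multiplier * std_error
    return estimate - half_width, estimate + half_width


# Values reported in the Methods section: S.D. 739, 112 semester days.
se_28 = standard_error(739, 112, 28)
print(round(se_28))
print(confidence_region(1416, se_28, 1))
```

For a sample of 28 days the standard error comes to roughly 121. The sampling procedure supplies the estimate passed to `confidence_region`; the formula supplies only the width of the region around it.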

Methods 

The library usage parameter (the mean number of patrons utilizing
a university library daily during one semester) estimated by these
sampling techniques is mathematically expressed as follows:

$$\mu = \frac{\sum X}{N_p}$$

Where $\mu$ = the mean number of patrons utilizing the library daily during
one semester,

$X$ = the number of patrons utilizing the library for any given day
during the semester. This term is then summed over all days
of the semester.

$N_p$ = the total number of days during the semester ($N_p$ = 112).

Data were collected on the actual number of patrons utilizing the
library each day for one semester. This count was made on a continuous
basis by personnel assigned to library exits who had been instructed to
record, with a counting device, the number of patrons exiting the library.
The mean number of daily patrons utilizing the library during the semester
was found to be 1416; the standard deviation was found to be 739.

In summary, then, the population parameter to be estimated by the
two sampling techniques was the mean number of daily patrons utilizing
the library during the semester, and as stated above, this value was
computed beforehand and was found to be 1416.

The distribution for the theoretical sampling model is presented in
Table 1. It contains the expected error specifying the confidence regions
for various sample sizes at the 68% and 95% confidence levels. The mean
number of daily patrons (1416) utilizing the library during the semester
as well as the standard deviation (739), the population size (112), and
the sample size (30) were used to derive these estimates, i.e., the
appropriate values were substituted into the equation explained above. This
provided the expected error for confidence intervals of 68% and 95% for
each of the sample sizes listed in Table 1. For example, if the sample
size were 35, we would expect that 68 times out of 100 the true value of
the estimated parameter would fall within ±104 units of the estimated
value of the parameter; also, 95 times out of 100 the true value of the
estimated parameter would fall within ±208 units of the estimated value
of the parameter.
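
As a check on the figures just quoted, the arithmetic for a sample of 35 days can be written out. This assumes that the standard error takes the usual finite-population-corrected form and that the 95% region is taken as two standard errors, which reproduces the 208 quoted; neither assumption is stated explicitly in the text.

```latex
\text{S.D. of } \bar{x}
  = \sqrt{\frac{739^2}{35} \cdot \frac{112 - 35}{112 - 1}}
  \approx \sqrt{15603.5 \times 0.6937}
  \approx 104

\text{68\% region: } \pm 104
\qquad
\text{95\% region: } \pm 2(104) = \pm 208
```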

Thus, if one knew the population variance, the population size (e.g.,
number of semester days) and chose a particular sample size, then confidence
regions around the parameter (estimated by the random sampling method)
could be obtained. In practice, the only variable that would be left
unspecified after one sample of 35 (or any other sample size that might be
selected) had been taken and the parameter estimated would be the
population variance. However, if the sample size is 50 or more, the variance
of the elements of the sample would closely approximate the population
variance. On the other hand, if the sample size is substantially less
than 50, the population variance could be estimated by using previously
collected data if it were available. For example, if one wanted to
estimate the previously referred to parameter using a sample size of much less
than 50, it might be necessary to obtain the variance of the number of
patrons utilizing the library during some previous semester to estimate
the population variance. Of course, this is predicated upon the
assumption that the variance would not differ substantially from semester to
semester. In many instances, this could be a tenuous assumption. At
any rate, a decision would have to be made which would involve weighing
the accuracy of a small sample size versus the biased estimate that could
occur by using the variance of a previous semester.

It was determined that 28 days would constitute a satisfactory
sample from the total number of days the library was open during the semester.
Once this decision was reached and the procedures described earlier had
been arranged, then much of the groundwork had been established for
instituting a procedure which compared the effectiveness of two sampling
techniques.

Figure 1 contains the sampling distribution of sample means (of
number of patrons utilizing the library daily for one semester) that were
obtained empirically. This was accomplished by drawing, without replacement,
30 independent random samples of 28 semester days from the population of
112 semester days. The ordinate of Figure 1 contains the actual
frequency with which each of the sample means fell within the specified
interval of values indicated along the abscissa. The size of the interval
was 50 units. This interval width was selected for it was felt that it
would yield the most accurate visual representation of the data. It
should be noted that the empirical distribution approximates a normal
distribution. Also, the values tend to distribute themselves about the
true value of the parameter. The mean of the empirical distribution was
1380 and the standard deviation was 140. The expected mean of this
distribution would be 1416 while the expected standard deviation would be
121 (see Table 1). Thus, this empirically derived sampling distribution
provided an accurate representation of a sampling distribution that might
be obtained by using 30 such random samplings with each sample
constituting 28 semester days: the values of the mean and standard deviation of
this empirical distribution conformed rather well to the theoretical
distribution.
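
The repeated-sampling procedure behind Figure 1 can be imitated with a short simulation. The sketch below is not the study's data: the daily counts are synthetic values generated for illustration (loosely centered on the reported mean and standard deviation), and the helper name `sample_means` and the random seed are arbitrary choices.

```python
import random
import statistics


def sample_means(population, sample_size, n_samples, rng):
    """Means of n_samples simple random samples drawn without replacement."""
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(n_samples)
    ]


rng = random.Random(1973)  # arbitrary seed, for reproducibility only

# Hypothetical daily patron counts for 112 semester days (NOT the study data).
population = [max(0, round(rng.gauss(1416, 739))) for _ in range(112)]

# 30 independent samples of 28 days each, as described in the text.
means = sample_means(population, 28, 30, rng)

print("mean of sample means:", round(statistics.mean(means)))
print("S.D. of sample means:", round(statistics.stdev(means)))
```

Each call to `rng.sample` draws 28 distinct days, so sampling is without replacement within a sample, while the 30 samples are drawn independently of one another.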

Figure 2 contains the sampling distribution of means that were
obtained using the same data, the same sample size, the same number of
samples, but a slight modification of the previous sampling method. This
modification took into account the effects that various days of the week
might have on patron utilization of the library. (It should be obvious
that other variables also might significantly affect sampling results.)
The mean of this distribution was 1392 and the standard deviation was
[value illegible]. It can be seen that the sample estimates, in general,
more nearly approximated the true value of the parameter in this case than
by the method shown in Figure 1. The mean error (obtained by summing the
absolute values of each of the sample errors and dividing by the number of
samples) in estimating the parameter by this method was 77. The mean error
over all samplings in estimating the parameter by the previous method was
114. The difference between these two values was statistically significant
(p < .01). More importantly, the reduction in the mean error was 32.5%.
Therefore, some elaboration of this modified version of the previous
sampling technique is in order.
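
The size of the reduction follows directly from the two mean errors reported above (114 for the pure random method, 77 for the modified method):

```latex
\frac{114 - 77}{114} = \frac{37}{114} \approx 0.325 = 32.5\%
```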

The sampling technique that was employed in Figure 2 is known as
stratified random sampling. It was
observed that the number of patrons utilizing the library for certain
weekdays (e.g., Saturdays and Sundays) was at great variance from the
number of patrons utilizing the library for other weekdays. Therefore,
the sample was selected in such a way that each day of the week was
included four times in each of the 28-day samples. Thus, stratification
insured that each day of the week was represented an equal number of
times in the sample, although each day included in the sample was selected
randomly for each of the seven strata.
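
A minimal sketch of this stratified draw, compared against the pure random draw, is given below. It rests on several assumptions that are not in the study: the daily counts are synthetic, the weekday levels in `weekday_level` are invented to mimic low weekend usage, the semester is treated as 16 whole weeks, and the names `stratified_sample` and `mean_abs_error` are hypothetical.

```python
import random
import statistics

DAYS_PER_WEEK = 7


def stratified_sample(daily_counts, per_stratum, rng):
    """Pick per_stratum days at random from each day-of-week stratum."""
    sample = []
    for weekday in range(DAYS_PER_WEEK):
        stratum = daily_counts[weekday::DAYS_PER_WEEK]  # all days of one weekday
        sample.extend(rng.sample(stratum, per_stratum))
    return sample


def mean_abs_error(estimates, true_value):
    """Average absolute deviation of the estimates from the true value."""
    return statistics.mean([abs(e - true_value) for e in estimates])


rng = random.Random(0)  # arbitrary seed

# Hypothetical usage levels, Monday..Sunday (NOT the study data).
weekday_level = [1800, 1800, 1800, 1800, 1600, 500, 600]
daily_counts = [
    max(0, round(rng.gauss(weekday_level[d % DAYS_PER_WEEK], 150)))
    for d in range(112)
]
true_mean = statistics.mean(daily_counts)

# 30 samples of 28 days under each design.
simple = [statistics.mean(rng.sample(daily_counts, 28)) for _ in range(30)]
strat = [statistics.mean(stratified_sample(daily_counts, 4, rng)) for _ in range(30)]

print("pure random mean error:", round(mean_abs_error(simple, true_mean)))
print("stratified mean error: ", round(mean_abs_error(strat, true_mean)))
```

Because every stratum contributes exactly four days, the weekend days can never be over- or under-represented in a sample, which is the source of variability the stratified design removes.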

Conclusions

A theoretical model has been presented that makes it possible to
estimate confidence regions. This model was based on a random sampling
method which did not involve stratification; however, the identical
procedure for estimating confidence regions can be used to estimate
confidence regions for the stratified random sampling method.

The confidence regions that would result by using this procedure if
the stratified random sampling design were employed would be expected to
be somewhat wider than the actual confidence regions; i.e., the parameter
estimates would be more precise than the width of the estimated confidence
region would indicate. This presents no major problem in that
interpretations of parameter estimates based on wider confidence regions would
consequently tend to be more cautious than interpretations based on narrower
confidence regions. The fact remains that such parameter estimates are
generally more precise if a stratified random design is properly used
instead of the purely random design.

The brief examination and comparison of two sampling techniques
demonstrates the increased precision that can occur from careful
consideration of the characteristics of the elements that are being measured.
For example, the observation that great variability existed in the daily
patron usage of the library suggested that the sample should be
stratified and that a fixed portion of the sample should be taken from each of
the strata. This procedure ensured that the proportion of the sample in
each of the seven strata was the same as the proportion of the population
in each of the seven strata. If the stratified random sampling design
is to be used correctly, it is necessary that the proportion of the sample
in each of the strata be the same as the proportion of the population
in that stratum.

Careful consideration must be given to the idiosyncrasies, habits,
and makeup of the elements of the population that is sampled. Such
consideration should improve the utilization of sampling procedures, thereby
yielding more precise estimates of library parameters.
